Auto Learning Attention: Supplementary Material
The initial learning rate is 0.1, the weight decay is set to 0.0005, and the batch size is 256. The results are summarised in Table 3 of the paper. We replace the backbone with ResNet50 to evaluate the performance of different attention modules. The conv5_x, average pooling, fc, and softmax layers are removed from the original classification model.
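As a rough illustration of the settings above, the following sketch records the stated hyperparameters and shows which ResNet50 stages would remain after the listed layers are removed. It is not the authors' code. The names `TRAIN_CONFIG`, `RESNET50_LAYERS`, `REMOVED_LAYERS`, and `truncate_backbone` are hypothetical, and the stage labels follow the usual ResNet naming (conv1 through conv5_x). This section does not say whether the hyperparameters and the backbone truncation belong to the same experiment, so they are kept as separate objects.

```python
# Hyperparameters quoted in the text. The optimizer and the learning-rate
# schedule are not given in this section, so they are deliberately omitted.
TRAIN_CONFIG = {
    "initial_lr": 0.1,
    "weight_decay": 0.0005,
    "batch_size": 256,
}

# Layers of a ResNet50 classification model, in forward order.
# "avgpool" is a shorthand label chosen here for the average pooling layer.
RESNET50_LAYERS = [
    "conv1", "conv2_x", "conv3_x", "conv4_x",
    "conv5_x", "avgpool", "fc", "softmax",
]

# Layers the text says are removed from the original classification model.
REMOVED_LAYERS = {"conv5_x", "avgpool", "fc", "softmax"}


def truncate_backbone(layers, removed):
    """Return the layers that remain, preserving their original order."""
    return [name for name in layers if name not in removed]


backbone = truncate_backbone(RESNET50_LAYERS, REMOVED_LAYERS)
print(backbone)  # ['conv1', 'conv2_x', 'conv3_x', 'conv4_x']
```

Under these assumptions the truncated backbone ends at conv4_x, so any attention module under evaluation would operate on features from the stages up to that point.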
Auto Learning Attention Benteng Ma
Attention modules have been shown to be effective at strengthening the representation ability of a neural network by reweighting spatial or channel features, or by stacking both operations sequentially. However, designing the structure of each attention operation requires substantial computation and extensive expertise.